Semi-Supervised Recurrent Neural Network for Adverse Drug Reaction Mention Extraction
Social media is a useful platform for sharing health-related information due to
its vast reach. This makes it a good candidate for public-health monitoring
tasks, specifically for pharmacovigilance. We study the problem of extracting
Adverse-Drug-Reaction (ADR) mentions from social media, particularly from
Twitter. Medical information extraction from social media is challenging,
mainly due to the short and highly informal nature of the text, as compared to more
technical and formal medical reports.
Current methods in ADR mention extraction rely on supervised learning
methods, which suffer from the scarcity of labeled data. The state-of-the-art
method uses deep neural networks, specifically a class of Recurrent Neural
Networks (RNNs) known as Long Short-Term Memory networks (LSTMs)
\cite{hochreiter1997long}. Because of their large number of free parameters,
deep neural networks rely heavily on large annotated corpora to learn the end
task. In the real world, however, large labeled datasets are hard to obtain,
mainly because of the heavy cost of manual annotation. Towards this end, we
propose a novel semi-supervised RNN model that can also leverage the
unlabeled data available in abundance on social media. Through experiments we
demonstrate the effectiveness of our method, which achieves state-of-the-art
performance in ADR mention extraction.
Comment: Accepted at DTMBIO workshop, CIKM 2017. To appear in BMC
Bioinformatics. Please cite that version.
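ADR mention extraction amounts to finding spans of tweet tokens that name a reaction. A common way to cast such a task for an LSTM tagger is BIO sequence labeling, but that scheme is an assumption here, not something the abstract states. The sketch below shows only the final step of that pipeline: turning per-token tags into mention spans. The function name `bio_to_spans` and the tag strings are hypothetical.

```python
def bio_to_spans(tokens, tags):
    """Collect (start, end, text) mention spans from BIO tags.

    Assumed tag set: "B" begins a mention, "I" continues it, "O" is outside.
    `end` is exclusive. A stray "I" with no open mention starts a new one.
    """
    spans, start = [], None
    # The trailing "O" sentinel flushes a mention that ends at the last token.
    for i, tag in enumerate(tags + ["O"]):
        if tag == "B" or tag == "O" or (tag == "I" and start is None):
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None
            if tag in ("B", "I"):
                start = i
    return spans


tokens = ["this", "drug", "gave", "me", "bad", "headaches", "and", "nausea"]
tags = ["O", "O", "O", "O", "B", "I", "O", "B"]
print(bio_to_spans(tokens, tags))
```

Keeping decoding separate from the tagger means the same function works whether the tags come from a supervised or a semi-supervised model.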
Modeling Heterogeneous Statistical Patterns in High-dimensional Data by Adversarial Distributions: An Unsupervised Generative Framework
Since collecting labels is prohibitively expensive and time-consuming, unsupervised
methods are preferred in applications such as fraud detection. Meanwhile, such
applications usually require modeling the intrinsic clusters in
high-dimensional data, which often display heterogeneous statistical
patterns, as the patterns of different clusters may appear in different
dimensions. Existing methods model the data clusters on selected
dimensions, yet globally omitting any dimension may damage the patterns of
certain clusters. To address these issues, we propose a novel unsupervised
generative framework called FIRD, which uses adversarial distributions to
fit and disentangle the heterogeneous statistical patterns. When applied to
discrete spaces, FIRD effectively distinguishes synchronized fraudsters
from normal users. In addition, FIRD outperforms state-of-the-art anomaly
detection methods on anomaly detection datasets (over 5% average AUC
improvement). Significant experimental results on various
datasets verify that the proposed method better models the heterogeneous
statistical patterns in high-dimensional data and benefits downstream
applications.
Semi-supervised Relation Extraction using EM Algorithm
Relation Extraction is the task of identifying relations between entities in a natural language sentence. We propose a semi-supervised approach for relation extraction based on the EM algorithm, which uses a few relation-labeled seed examples and a large number of unlabeled examples (which are, however, labeled with entities). We present an analysis of how unlabeled data helps improve the overall accuracy compared to a baseline system that uses only labeled data. This work therefore shows the efficacy of a sound theoretical framework that exploits an easily obtainable resource, “unlabeled data”, for the problem of relation extraction.
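The abstract does not specify the model that EM is wrapped around. As a generic illustration of how EM lets a few labeled seeds and many unlabeled examples train one classifier, the sketch below uses the classic multinomial Naive Bayes EM recipe over bag-of-words tokens. The model choice, the token features, and all function names are assumptions for illustration, not the paper's method.

```python
import math
from collections import Counter


def train_nb(docs, weights, labels, vocab, alpha=1.0):
    """M-step: Laplace-smoothed class priors and per-class token probabilities,
    using (possibly fractional) class weights for every example."""
    prior, cond = {}, {}
    total = sum(sum(w.values()) for w in weights)
    for c in labels:
        mass = sum(w[c] for w in weights)
        prior[c] = (mass + alpha) / (total + alpha * len(labels))
        counts = Counter()
        for doc, w in zip(docs, weights):
            for tok in doc:
                counts[tok] += w[c]  # fractional counts from soft labels
        denom = sum(counts.values()) + alpha * len(vocab)
        cond[c] = {t: (counts[t] + alpha) / denom for t in vocab}
    return prior, cond


def posterior(doc, prior, cond):
    """E-step for one example: P(class | tokens); unseen tokens are ignored."""
    logp = {
        c: math.log(prior[c]) + sum(math.log(cond[c][t]) for t in doc if t in cond[c])
        for c in prior
    }
    m = max(logp.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in logp.values())
    return {c: math.exp(v - m) / z for c, v in logp.items()}


def semi_supervised_em(labeled, unlabeled, labels, iters=10):
    """labeled: list of (tokens, relation); unlabeled: list of token lists."""
    docs_l = [d for d, _ in labeled]
    vocab = {t for d in docs_l + unlabeled for t in d}
    # Seed examples keep hard one-hot weights in every iteration.
    w_l = [{c: 1.0 if c == y else 0.0 for c in labels} for _, y in labeled]
    prior, cond = train_nb(docs_l, w_l, labels, vocab)  # initialize from seeds only
    for _ in range(iters):
        w_u = [posterior(d, prior, cond) for d in unlabeled]  # E-step
        prior, cond = train_nb(docs_l + unlabeled, w_l + w_u, labels, vocab)  # M-step
    return prior, cond
```

For example, with seeds `(["born", "in"], "birthplace")` and `(["works", "for"], "employer")` and the unlabeled example `["works", "at"]`, the token `at` starts out uninformative but ends up favoring `employer` after EM. This is the kind of gain from unlabeled data that the abstract analyzes.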
What’s Next? A Recommendation System for Industrial Training
Continuous training is crucial for creating and maintaining the right skill profile for an industrial organization’s workforce. There is a tremendous variety in the trainings available within an organization: technical, project management, quality, leadership, domain-specific, soft skills, etc. Hence it is important to assist employees in choosing the trainings that best suit their background, project needs and career goals. In this paper, we focus on algorithms for training recommendation in an industrial setting. We formalize the problem of next-training recommendation, taking into account the employee’s training and work history. We present several new unsupervised sequence mining algorithms that mine the organization’s past training data to arrive at personalized next-training recommendations. Using real-life data about the trainings of 118,587 employees over 5019 distinct trainings from a large multinational IT organization, we show that these algorithms outperform several standard recommendation engine algorithms as well as those based on standard sequence mining algorithms.
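The abstract does not describe the new algorithms themselves. To make the problem formulation concrete, the sketch below shows a deliberately simple sequence-mining baseline of the kind such algorithms are compared against. It counts how often one training directly follows another across employees' histories. It then recommends the most frequent successors of the employee's latest training, skipping ones already completed. The first-order transition model and the names `mine_transitions` and `recommend_next` are hypothetical, and it ignores work history.

```python
from collections import Counter, defaultdict


def mine_transitions(histories):
    """Count how often training b directly follows training a across all employees."""
    trans = defaultdict(Counter)
    for seq in histories:
        for a, b in zip(seq, seq[1:]):
            trans[a][b] += 1
    return trans


def recommend_next(history, trans, k=3):
    """Rank successors of the most recent training, excluding completed ones."""
    if not history:
        return []
    done = set(history)
    candidates = trans.get(history[-1], Counter())  # .get avoids inserting keys
    ranked = [t for t, _ in candidates.most_common() if t not in done]
    return ranked[:k]


histories = [
    ["python", "ml", "deep-learning"],
    ["python", "ml", "mlops"],
    ["python", "ml", "deep-learning"],
    ["java", "spring"],
]
trans = mine_transitions(histories)
print(recommend_next(["python", "ml"], trans))
```

A personalized method would condition on more of the history than the last item, which is where richer sequence mining becomes useful.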